A Real-Time Scene Text to Speech System

Authors

Lukas Neumann

Jiri Matas

Abstract

The system is based on an efficient end-to-end real-time scene text localization and recognition method [1,2,3]. Individual characters are detected as Class-Specific Extremal Regions (CSERs) [4]. An efficient sequential classifier selects only those ERs with locally maximal probability p(region|character), with complexity linear in the number of image pixels. The stability requirement of MSERs [5] is dropped; the detector has a lower memory footprint and handles blurred, noisy and low-contrast text better. A novel sequential classifier exploits more computationally expensive features without a negative impact on performance. Recognized text from subsequent frames is aggregated and sent to a speech synthesizer. The first evaluation uses the ICDAR 2011 dataset.
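The selection of ERs with locally maximal probability mentioned in the abstract can be illustrated with a minimal sketch. It assumes the classifier has already produced one probability p(region|character) per region in a sequence of nested ERs (one per intensity threshold); the function name `select_local_maxima` and the cutoff `p_min` are hypothetical and do not come from the abstract, which gives no such details.

```python
from typing import List, Sequence


def select_local_maxima(probs: Sequence[float], p_min: float = 0.5) -> List[int]:
    """Return indices of strict local maxima in a probability sequence.

    probs: assumed classifier outputs p(region|character) for a sequence
           of nested regions, ordered by threshold (an assumption).
    p_min: hypothetical cutoff below which a maximum is discarded.
    """
    selected = []
    n = len(probs)
    for i, p in enumerate(probs):
        # Treat missing neighbours at the ends as minus infinity.
        left = probs[i - 1] if i > 0 else float("-inf")
        right = probs[i + 1] if i < n - 1 else float("-inf")
        # Strict comparison on both sides: plateaus are ignored in this sketch.
        if p >= p_min and p > left and p > right:
            selected.append(i)
    return selected


# Example: two peaks exceed the cutoff, the small bump at index 1 does not exist as a peak.
probabilities = [0.1, 0.4, 0.9, 0.6, 0.3, 0.7, 0.2]
print(select_local_maxima(probabilities))
```

The sketch makes a single pass over the sequence. How the actual system computes the probabilities, and any additional selection criteria it applies, is not stated in the abstract.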

Similar resources

Cipher text only attack on speech time scrambling systems using correction of audio spectrogram

Recently permutation multimedia ciphers were broken in a chosen-plaintext scenario. That attack models a very resourceful adversary which may not always be the case. To show insecurity of these ciphers, we present a cipher-text only attack on speech permutation ciphers. We show inherent redundancies of speech can pave the path for a successful cipher-text only attack. To that end, regularities ...

MT3S: Mobile Turkish Scene Text-to-Speech System for the Visually Impaired

Reading text is one of the essential needs of the visually impaired people. We developed a mobile system that can read Turkish scene and book text, using a fast gradient-based multi-scale text detection algorithm for real-time operation and Tesseract OCR engine for character recognition. We evaluated the OCR accuracy and running time of our system on a new, publicly available mobile Turkish sce...

A 3d audio-visual animated agent for expressive conversational question answering

This paper reports on the ACQA (Animated agent for Conversational Question Answering) project conducted at LIMSI. The aim is to design an expressive animated conversational agent (ACA) for conducting research along two main lines: 1/ perceptual experiments (eg perception of expressivity and 3D movements in both audio and visual channels): 2/ design of human-computer interfaces requiring head mo...

Audio-visual Analysis of Multimedia Documents for Automatic Topic Identification

This paper presents a system that shall automatically scan multimedia data like TV or radio broadcasts for the presence of specific topics and, whenever topics of users’ interests are detected, alert the related user. Our current work on the three main modules of the system will be shown. (1) The speech recognition system (with 18.7 % WER) is already among the most advanced German broadcast spe...

Automatic topic identification in multimedia broadcast data

Publication date: 2012
